Combining Strategies for Extracting Relations from Text Collections
Abstract
Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use for answering precise queries or for running data mining tasks. Our Snowball system extracts these relations from document collections starting with only a handful of user-provided example tuples. Based on these tuples, Snowball generates patterns that are used, in turn, to find more tuples. In this paper we introduce a new pattern and tuple generation scheme for Snowball, with different strengths and weaknesses than those of our original system. We also show preliminary results on how we can combine the two versions of Snowball to extract tuples more accurately.
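The tuple-to-pattern-to-tuple loop described in the abstract can be illustrated with a toy sketch. This is not Snowball's actual pattern or tuple generation scheme, which the abstract does not detail. It assumes, purely for illustration, that a pattern is the literal text found between the two values of a known tuple and that candidate entities are runs of capitalized words. The names `find_patterns`, `apply_patterns`, `bootstrap`, `SENTENCES`, and `SEEDS` are hypothetical.

```python
import re

# Hypothetical toy corpus and seed tuple (organization, location).
SENTENCES = [
    "Microsoft is based in Redmond.",
    "Exxon is based in Irving.",
    "Exxon, headquartered in Irving, reported earnings.",
    "Boeing, headquartered in Seattle, hired staff.",
]
SEEDS = {("Microsoft", "Redmond")}

# Assumption: an entity is one or more capitalized words separated by spaces.
ENTITY = r"([A-Z]\w*(?: [A-Z]\w*)*)"


def find_patterns(sentences, tuples):
    """Collect the literal text between the two values of each known tuple."""
    patterns = set()
    for left, right in tuples:
        rx = re.compile(re.escape(left) + r"(.+?)" + re.escape(right))
        for sentence in sentences:
            match = rx.search(sentence)
            if match:
                patterns.add(match.group(1))
    return patterns


def apply_patterns(sentences, patterns):
    """Propose new tuples wherever a pattern sits between two entities."""
    found = set()
    for pattern in patterns:
        rx = re.compile(ENTITY + re.escape(pattern) + ENTITY)
        for sentence in sentences:
            for match in rx.finditer(sentence):
                found.add((match.group(1), match.group(2)))
    return found


def bootstrap(sentences, seeds, rounds=2):
    """Alternate between pattern generation and tuple extraction."""
    tuples = set(seeds)
    for _ in range(rounds):
        patterns = find_patterns(sentences, tuples)
        tuples |= apply_patterns(sentences, patterns)
    return tuples


if __name__ == "__main__":
    for pair in sorted(bootstrap(SENTENCES, SEEDS)):
        print(pair)
```

On this toy corpus, the first round learns " is based in " from the seed and finds the Exxon tuple. The second round learns ", headquartered in " from that new tuple and finds the Boeing tuple. A real system would also need to score patterns and tuples so that one bad match does not propagate. The sketch omits that step.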
Similar resources
Facilitating Image Exploratory Search with Relations
Traditional text-based image retrieval does not fully support queries on semantic relationships between two entities. To better help the exploratory search on image collections, this paper presents a system for automatically extracting the relations between entities by analyzing the sentence dependency on the descriptions of the images. Our results demonstrate that using the extracted relations...
KELVIN: Extracting Knowledge from Large Text Collections
We describe the KELVIN system for extracting entities and relations from large text collections and its use in the TAC Knowledge Base Population Cold Start task run by the U.S. National Institute of Standards and Technology. The Cold Start task starts with an empty knowledge base defined by an ontology of entity types, properties and relations. Evaluations in 2012 and 2013 were done using a col...
A hybrid approach for extracting semantic relations from texts
We present an approach for extracting relations from texts that exploits linguistic and empirical strategies, by means of a pipeline method involving a parser, part-of-speech tagger, named entity recognition system, pattern-based classification and word sense disambiguation models, and resources such as ontology, knowledge base and lexical databases. The relations extracted can be used for vario...
Inducing hyperlinking rules in text collections
Automatic hyperlinking methods based on Information Extraction techniques and on linking rules firing on salient facts have been proposed to connect documents with “typed” relations. However, the activity of defining link types and writing linking rules may be cumbersome due to the large number of possibilities. In this paper, we tackle this issue proposing a model for automatically extracting ...
Text-based Knowledge Acquisition for Ontology Engineering
This paper describes an approach towards ontology engineering that makes use of text technology for extracting relevant semantic relations from document collections. A short description of corpus characteristics and examples of statistical text analysis results show how input for ontology design can be generated automatically. The Topic Map standard is used as an example for standardised repres...
Publication date: 2000